Limits of Emergent Reasoning of Large Language Models in Agentic Frameworks for Deterministic Games
Su, Chris, Li, Harrison, Marques, Matheus, Flint, George, Zhu, Kevin, Dev, Sunishchal
Recent work reports that Large Reasoning Models (LRMs) undergo a collapse in performance when solving puzzles beyond certain complexity thresholds. In subsequent discourse, questions have arisen as to whether the nature of the task muddles an evaluation of true reasoning. One potential confound is the requirement that the model keep track of the state space on its own. We provide a large language model (LLM) with an environment interface for Tower of Hanoi problems, allowing it to make a move with a tool call, provide written justification, observe the resulting state space, and reprompt itself for the next move. We observe that access to an environment interface does not delay or prevent performance collapse. Furthermore, LLM-parameterized policy analysis reveals increasing divergence from both optimal policies and uniformly random policies, suggesting that the model exhibits mode-like collapse at each level of complexity, and that performance depends on whether the mode reflects the correct solution for the problem. We suggest that a similar phenomenon might take place in LRMs.
- Asia > Vietnam > Hanoi > Hanoi (0.26)
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > New Jersey > Mercer County > Ewing (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)
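The environment-interface setup described in the abstract can be illustrated with a minimal sketch. Nothing below is the authors' code; the class and method names (`HanoiEnv`, `move`, `legal_moves`) are hypothetical, and the "tool call" is modeled as a plain method that returns the observed state:

```python
# Hypothetical sketch of the environment interface: an agent makes a move via
# a "tool call" (here, the `move` method) and observes the resulting state.
class HanoiEnv:
    def __init__(self, n_disks=3):
        self.n = n_disks
        # Each peg is a stack of disk sizes, largest at the bottom.
        self.pegs = [list(range(n_disks, 0, -1)), [], []]

    def legal_moves(self):
        moves = []
        for src in range(3):
            if not self.pegs[src]:
                continue
            for dst in range(3):
                if src != dst and (not self.pegs[dst]
                                   or self.pegs[dst][-1] > self.pegs[src][-1]):
                    moves.append((src, dst))
        return moves

    def move(self, src, dst):
        """The "tool call": apply a move, then return the observed state."""
        if (src, dst) not in self.legal_moves():
            return {"ok": False, "state": self.pegs, "solved": False}
        self.pegs[dst].append(self.pegs[src].pop())
        return {"ok": True, "state": self.pegs, "solved": self.solved()}

    def solved(self):
        return len(self.pegs[2]) == self.n

def optimal_moves(n, src=0, aux=1, dst=2):
    """The classic 2^n - 1 move optimal policy, for reference."""
    if n == 0:
        return []
    return (optimal_moves(n - 1, src, dst, aux)
            + [(src, dst)]
            + optimal_moves(n - 1, aux, src, dst))
```

Driving the environment with `optimal_moves` reproduces the observe-then-act loop; in the study, an LLM agent instead chooses each move from the observed state.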
A novel sensitivity analysis method for agent-based models stratifies in-silico tumor spheroid simulations
Rohr, Edward H., Nardini, John T.
Agent-based models (ABMs) are widely used in biology to understand how individual actions scale into emergent population behavior. Modelers employ sensitivity analysis (SA) algorithms to quantify input parameters' impact on model outputs; however, SA is hard to perform for ABMs because of their computational cost and complexity. In this work, we develop the Simulate, Summarize, Reduce, Cluster, and Analyze (SSRCA) methodology, a machine-learning-based pipeline designed to facilitate SA for ABMs. In particular, SSRCA can achieve the following tasks for ABMs: 1) identify sensitive model parameters, 2) reveal common model output patterns, and 3) determine which input parameter values generate these patterns. We use an example ABM of tumor spheroid growth to showcase how SSRCA provides similar SA results to the popular Sobol' method while also identifying four common patterns from the ABM and the parameter regions that generate these outputs. This analysis could streamline data-driven tasks, such as parameter estimation, for ABMs by reducing the parameter space. While we highlight these results with an ABM of tumor spheroid formation, the SSRCA methodology is broadly applicable to biological ABMs.
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > New Jersey > Mercer County > Ewing (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Energy (0.67)
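The five SSRCA stages can be sketched end to end on a toy stand-in for an ABM. This is an illustration of the pipeline's shape under stated assumptions, not the authors' implementation: the "ABM" here is just a logistic growth curve, and the summary statistics, PCA reduction, and k-means clustering are generic placeholder choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Simulate: toy stand-in for an ABM -- logistic growth curves
#    parameterized by a growth rate r and a carrying capacity K.
t = np.linspace(0, 10, 50)
def simulate(r, K):
    return K / (1 + (K - 1) * np.exp(-r * t))

params = rng.uniform([0.2, 5.0], [2.0, 50.0], size=(200, 2))  # (r, K) samples
outputs = np.array([simulate(r, K) for r, K in params])

# 2) Summarize: per-simulation summary statistics.
summaries = np.column_stack([outputs[:, -1],                        # final size
                             outputs[:, 25],                        # mid-time size
                             np.gradient(outputs, axis=1).max(1)])  # peak growth

# 3) Reduce: PCA (via SVD) on the standardized summaries.
Z = (summaries - summaries.mean(0)) / summaries.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
reduced = Z @ Vt[:2].T

# 4) Cluster: a few iterations of plain k-means, k = 2.
k = 2
centers = reduced[rng.choice(len(reduced), k, replace=False)]
for _ in range(20):
    labels = np.argmin(((reduced[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([reduced[labels == j].mean(0) if np.any(labels == j)
                        else centers[j] for j in range(k)])

# 5) Analyze: which parameter regions generate each output pattern?
for j in range(k):
    members = params[labels == j]
    if len(members):
        print(f"cluster {j}: mean (r, K) = {members.mean(0).round(2)}")
```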
NJ lawmaker calls on Department of Defense to 'immediately' probe mystery drone sightings
New Jersey State Senator Jon Bramnick joins'America Reports' to discuss recent mysterious drone sightings in New Jersey. A New Jersey state senator is calling on the Department of Defense to investigate the recent mysterious nighttime drone sightings amid rising public frustration over a lack of answers. "Let me be clear: The state police, this is way beyond their expertise … We know the Department of Defense has the technology to monitor these drones," State Sen. Jon Bramnick, R-N.J., told co-anchor John Roberts Wednesday on "America Reports." "The problem is we don't have the Department of Defense in New Jersey at this time. And that's what I call for. Until the Department of Defense comes in, shuts down airspace completely to drones, do a limited state of emergency – no drones in the sky until we figure out what's going on here," Bramnick warned.
- North America > United States > New Jersey > Monmouth County (0.07)
- North America > United States > New Jersey > Ocean County (0.05)
- North America > United States > New Jersey > Mercer County > Ewing (0.05)
- North America > United States > New Jersey > Essex County > Newark (0.05)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Military (1.00)
TrajDiffuse: A Conditional Diffusion Model for Environment-Aware Trajectory Prediction
Liu, Qingze, Li, Danrui, Sohn, Samuel S., Yoon, Sejong, Kapadia, Mubbasir, Pavlovic, Vladimir
Accurate prediction of human or vehicle trajectories with good diversity that captures their stochastic nature is an essential task for many applications. However, many trajectory prediction models produce unreasonable trajectory samples that focus on improving diversity or accuracy while neglecting other key requirements, such as collision avoidance with the surrounding environment. In this work, we propose TrajDiffuse, a planning-based trajectory prediction method using a novel guided conditional diffusion model. We formulate the trajectory prediction problem as a denoising inpainting task and design a map-based guidance term for the diffusion process. TrajDiffuse is able to generate trajectory predictions that match or exceed the accuracy and diversity of the SOTA, while adhering almost perfectly to environmental constraints. We demonstrate the utility of our model through experiments on the nuScenes and PFSD datasets and provide an extensive benchmark analysis against the SOTA methods.
- North America > United States > New Jersey > Mercer County > Ewing (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
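The map-based guidance idea, steering an iterative denoising process with the gradient of an environment penalty, can be sketched in toy form. This is not the TrajDiffuse model: the "denoiser" below is just a pull toward a straight-line prior, and the obstacle, step sizes, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# A noisy trajectory of T waypoints between a fixed start and goal.
T = 32
line = np.linspace([0.0, 0.0], [10.0, 0.0], T)   # straight-line prior
xs = line + rng.normal(0, 1.0, size=(T, 2))

center, R = np.array([5.0, 0.0]), 1.0            # circular no-go map region

def guidance(x):
    """Gradient of a map penalty: push waypoints out of the obstacle."""
    d = x - center
    dist = np.linalg.norm(d, axis=1, keepdims=True) + 1e-9
    inside = (dist < R).astype(float)
    return inside * (R - dist) * d / dist

# Iterative refinement: a "denoising" pull toward the prior, plus guidance.
for _ in range(300):
    xs = xs + 0.1 * (line - xs) + guidance(xs)
```

After refinement the trajectory hugs the prior while respecting the no-go region; in the real model the hand-written pull toward the prior is replaced by a learned diffusion denoiser.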
Automatic Logical Forms improve fidelity in Table-to-Text generation
Table-to-text systems generate natural language statements from structured data like tables. While end-to-end techniques suffer from low factual correctness (fidelity), a previous study reported gains when using manual logical forms (LF) that represent the selected content and the semantics of the target text. Given the manual step, it was not clear whether automatic LFs would be effective, or whether the improvement came from content selection alone. We present TlT which, given a table and a selection of the content, first produces LFs and then the textual statement. We show for the first time that automatic LFs improve quality, with an increase in fidelity of 30 points over a comparable system not using LFs. Our experiments allow us to quantify the remaining challenges for high factual correctness, with automatic selection of content coming first, followed by better Logic-to-Text generation and, to a lesser extent, better Table-to-Logic parsing.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria (0.04)
- Oceania > Fiji (0.04)
- (31 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
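The role of a logical form as an executable intermediate between table and text can be illustrated with a toy example. This is not the TlT system; the LF schema and `realize` wording below are hypothetical, but they show why executing the LF against the table keeps the stated number faithful to the data.

```python
# A tiny table and a hypothetical logical-form schema: ("count", column, op, value).
table = [
    {"team": "Eagles", "wins": 11},
    {"team": "Giants", "wins": 6},
    {"team": "Jets", "wins": 7},
]

def build_lf(column, threshold):
    return ("count", column, ">", threshold)

def execute_lf(lf, table):
    _, col, _, thr = lf
    return sum(1 for row in table if row[col] > thr)

def realize(lf, table):
    """Realize the LF as text; the stated count is *computed*, not generated."""
    _, col, op, thr = lf
    wording = {">": "more than"}
    return f"{execute_lf(lf, table)} teams have {wording[op]} {thr} {col}."

lf = build_lf("wins", 6)
print(realize(lf, table))
```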
Early Forecasting of Text Classification Accuracy and F-Measure with Active Learning
Orth, Thomas, Bloodgood, Michael
When creating text classification systems, one of the major bottlenecks is the annotation of training data. Active learning has been proposed to address this bottleneck using stopping methods to minimize the cost of data annotation. An important capability for improving the utility of stopping methods is to effectively forecast the performance of the text classification models. Forecasting can be done through the use of logarithmic models regressed on some portion of the data as learning is progressing. A critical unexplored question is what portion of the data is needed for accurate forecasting. There is a tension: it is desirable to use less data so that the forecast can be made earlier, which is more useful, but it is also desirable to use more data so that the forecast can be more accurate. We find that when using active learning it is even more important to generate forecasts earlier so as to make them more useful and not waste annotation effort. We investigate the difference in forecasting difficulty when using accuracy and F-measure as the text classification system performance metrics, and we find that F-measure is more difficult to forecast. We conduct experiments on seven text classification datasets in different semantic domains with different characteristics and with three different base machine learning algorithms. We find that forecasting is easiest for decision tree learning, moderate for Support Vector Machines, and most difficult for neural networks.
- North America > United States > New Jersey > Mercer County > Ewing (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- (11 more...)
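The logarithmic-model forecasting described above can be sketched as an ordinary least-squares fit of accuracy against the log of the amount of labeled data, using only an early portion of a (here synthetic) learning curve:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic learning curve: accuracy grows logarithmically with labeled data.
x = np.arange(100, 2100, 100)                       # labeled examples so far
true_curve = 0.55 + 0.06 * np.log(x / 100.0)
y = true_curve + rng.normal(0, 0.005, size=x.size)  # observed noisy accuracy

# Fit y = a + b * log(x) on only the first 8 of 20 iterations.
portion = 8
A = np.column_stack([np.ones(portion), np.log(x[:portion])])
coef, *_ = np.linalg.lstsq(A, y[:portion], rcond=None)

# Forecast accuracy at the final iteration from the early fit.
forecast = coef[0] + coef[1] * np.log(x[-1])
print(f"forecast: {forecast:.3f}, actual: {y[-1]:.3f}")
```

The earlier the cutoff (`portion`), the sooner the forecast is available but the larger its extrapolation error, which is exactly the tension the abstract describes.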
Stopping Active Learning based on Predicted Change of F Measure for Text Classification
Altschuler, Michael, Bloodgood, Michael
During active learning, an effective stopping method allows users to limit the number of annotations, which is cost effective. In this paper, a new stopping method called Predicted Change of F Measure is introduced that attempts to provide users with an estimate of how much the performance of the model is changing at each iteration. This stopping method can be applied with any base learner. This method is useful for reducing the data annotation bottleneck encountered when building text classification systems. I. INTRODUCTION The use of active learning to train machine learning models has been used as a way to reduce annotation costs for text and speech processing applications [1], [2], [3], [4], [5]. Active learning has been shown to have a particularly large potential for reducing annotation cost for text classification [6], [7]. Text classification is one of the most important fields in semantic computing and has been used in many applications [8], [9], [10], [11], [12]. A. Active Learning Active learning is a form of machine learning that gives the model the ability to select the data from which it learns and to choose when to end the process of training. In active learning, the model is first provided a small batch of annotated data to be trained on. Then, in each following iteration, the model selects a small batch from a large unlabeled set of examples and removes this batch from that set.
- North America > United States > New Jersey > Mercer County > Ewing (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (10 more...)
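The active-learning loop described in the introduction, seed set, train, select an uncertain batch, annotate, repeat, can be sketched with uncertainty sampling and a simple logistic-regression base learner. The data, batch size, and learner below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary text-classification stand-in: two Gaussian blobs.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

def train(Xl, yl, steps=500, lr=0.1):
    """Logistic regression by gradient descent (the base learner)."""
    Xb = np.column_stack([Xl, np.ones(len(Xl))])
    w = np.zeros(3)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - yl) / len(yl)
    return w

def predict_proba(w, Xq):
    return 1 / (1 + np.exp(-np.column_stack([Xq, np.ones(len(Xq))]) @ w))

# Seed with a small annotated batch, then iterate: train, pick the most
# uncertain unlabeled examples, "annotate" them, and continue.
labeled = list(rng.choice(len(X), 10, replace=False))
unlabeled = [i for i in range(len(X)) if i not in labeled]

for _ in range(5):                                   # 5 iterations, batch size 10
    w = train(X[labeled], y[labeled])
    p = predict_proba(w, X[unlabeled])
    uncertain = np.argsort(np.abs(p - 0.5))[:10]     # closest to the boundary
    batch = [unlabeled[i] for i in uncertain]
    labeled += batch
    unlabeled = [i for i in unlabeled if i not in batch]
```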
The Use of Unlabeled Data versus Labeled Data for Stopping Active Learning for Text Classification
Beatty, Garrett, Kochis, Ethan, Bloodgood, Michael
Annotation of training data is the major bottleneck in the creation of text classification systems. Active learning is a commonly used technique for reducing the amount of training data one needs to label. A crucial aspect of active learning is determining when to stop labeling data. Three potential sources for informing when to stop active learning are an additional labeled set of data, an unlabeled set of data, and the training data that is labeled during the process of active learning. To date, no one has compared and contrasted the advantages and disadvantages of stopping methods based on these three information sources. We find that stopping methods that use unlabeled data are more effective than methods that use labeled data. I. INTRODUCTION The use of active learning to train machine learning models has been used as a way to reduce annotation costs for text and speech processing applications [1], [2], [3], [4], [5]. Active learning has been shown to have a particularly large potential for reducing annotation cost for text classification [6], [7]. Text classification is one of the most important fields in semantic computing and has been used in many applications [8], [9], [10], [11], [12].
- North America > United States > New Jersey > Mercer County > Ewing (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (10 more...)
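A stopping rule driven by unlabeled data can be sketched as checking whether successive models' predictions on a held-out unlabeled pool have stabilized. The agreement threshold and window below are illustrative assumptions, not the paper's reported settings:

```python
import numpy as np

def should_stop(history, threshold=0.99, window=3):
    """Stop once the last `window` successive agreements all clear the threshold."""
    if len(history) <= window:
        return False
    pairs = range(len(history) - window - 1, len(history) - 1)
    return all(np.mean(history[i] == history[i + 1]) >= threshold for i in pairs)

# Simulated predictions on an unlabeled pool, stabilizing over iterations.
rng = np.random.default_rng(0)
base = rng.integers(0, 2, 1000)
history = []
for noise in [0.3, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0]:
    p = base.copy()
    p[rng.random(1000) < noise] ^= 1   # early models disagree more
    history.append(p)
```

The appeal of this family of methods is that the pool never needs labels: only agreement between consecutive models is measured.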
The History of Robots: From the 400 BC Archytas to the Boston Dynamics' Robot Dog
Robots have fascinated and preoccupied human minds for centuries - from ancient tales of stone golems, to modern science fiction. Though the word "robot" was only officially penned in 1921 by Karel Čapek, mankind has endeavored to create autonomous machines since as far back as the 4th Century BCE. Today, robots are widely used across a variety of industries, aiding in the manufacturing of vehicles and more. According to the International Federation of Robotics, in 2015 there were as many as 1.63 million industrial robots in operation worldwide, and that number continues to grow steadily each year. Here's a brief history of how robotics has evolved and grown from the early imaginings of 400 BCE, to the global resource it is today. The earliest beginnings of robotics can be traced back to Ancient Greece. Aristotle was one of the first great thinkers to consider automated tools, and how these tools would affect society at large.
- Europe > Greece (0.24)
- North America > United States > New Jersey > Mercer County > Ewing (0.04)
- Europe > Germany (0.04)
- (2 more...)
- Health & Medicine (0.70)
- Government (0.69)
- Leisure & Entertainment > Games > Chess (0.32)
Impact of Batch Size on Stopping Active Learning for Text Classification
Beatty, Garrett, Kochis, Ethan, Bloodgood, Michael
When using active learning, smaller batch sizes are typically more efficient from a learning efficiency perspective. However, in practice due to speed and human annotator considerations, the use of larger batch sizes is necessary. While past work has shown that larger batch sizes decrease learning efficiency from a learning curve perspective, it remains an open question how batch size impacts methods for stopping active learning. We find that large batch sizes degrade the performance of a leading stopping method over and above the degradation that results from reduced learning efficiency. We analyze this degradation and find that it can be mitigated by changing the window size parameter of how many past iterations of learning are taken into account when making the stopping decision. We find that when using larger batch sizes, stopping methods are more effective when smaller window sizes are used.
- North America > United States > New Jersey > Mercer County > Ewing (0.17)
- North America > United States > California > Orange County > Laguna Hills (0.15)
- North America > United States > Colorado > Boulder County > Boulder (0.05)
- (4 more...)
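The window-size parameter discussed above can be made concrete with a minimal sketch: a stopping rule that fires when some model-evaluation signal has changed little over the last `window` iterations. The signal, threshold, and values are illustrative:

```python
def should_stop(signal, window=3, epsilon=0.005):
    """Stop when the signal changed by less than epsilon across the window."""
    if len(signal) < window + 1:
        return False
    recent = signal[-(window + 1):]
    return max(recent) - min(recent) < epsilon

# A learning curve that plateaus; the rule fires only once it flattens out.
curve = [0.60, 0.70, 0.76, 0.80, 0.82, 0.828, 0.830, 0.831, 0.8312]
stop_at = next(i for i in range(len(curve)) if should_stop(curve[:i + 1]))
print(f"stop after iteration {stop_at}")  # a smaller window would fire sooner
```

With large batches, each iteration spans many annotations, so shrinking `window` (as the abstract suggests) lets the rule react before extra batches are wasted.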